Abstract: Given an unknown signal $x_0 \in \mathbb{R}^n$ and linear
noisy measurements $y = Ax_0 + \sigma v \in \mathbb{R}^m$, the generalized
$\ell_2^2$-LASSO solves $\hat{x} := \arg\min_x \frac{1}{2}\|y - Ax\|_2^2 + \sigma\lambda f(x)$. Here,
$f$ is a convex regularization function (e.g. $\ell_1$-norm, nuclear
norm) aiming to promote the structure of $x_0$ (e.g. sparse, low-rank),
and $\lambda > 0$ is the regularizer parameter. A related
optimization problem, though not as popular or well-known,
is often referred to as the generalized $\ell_2$-LASSO and takes the
form $\hat{x} := \arg\min_x \|y - Ax\|_2 + \lambda f(x)$, and has been analyzed
in [1]. [1] further made conjectures about the performance of
the generalized $\ell_2^2$-LASSO. This paper establishes these conjectures
rigorously. We measure performance with the normalized
squared error $\mathrm{NSE}(\sigma) := \|\hat{x} - x_0\|_2^2/\sigma^2$. Assuming the entries
of $A$ and $v$ are i.i.d. standard normal, we precisely characterize
the "asymptotic NSE" $\mathrm{aNSE} := \lim_{\sigma\to 0} \mathrm{NSE}(\sigma)$ when the
problem dimensions $m, n$ tend to infinity in a proportional
manner. The role of $\lambda$, $f$ and $x_0$ is explicitly captured in the
derived expression by means of a single geometric quantity, the
Gaussian distance to the subdifferential. We conjecture that
$\mathrm{aNSE} = \sup_{\sigma>0} \mathrm{NSE}(\sigma)$. We include detailed discussions on
the interpretation of our result, make connections to relevant
literature and perform computational experiments that validate
our theoretical findings.

I. Introduction 
A. Generalized LASSO 

The Generalized $\ell_2^2$-LASSO has emerged as a powerful
tool for the recovery of structured signals (sparse, low
rank, etc.) from linear noisy measurements in a variety of
applications in statistics, signal processing, machine learning,
etc. Given an unknown signal $x_0 \in \mathbb{R}^n$ and measurements
$y = Ax_0 + z \in \mathbb{R}^m$, it solves:

$$\hat{x} := \arg\min_x \tfrac{1}{2}\|y - Ax\|_2^2 + \sigma\lambda f(x). \qquad (1)$$

Here, $f$ is a convex regularization function, typically non-smooth
(e.g. $\ell_1$-norm, nuclear norm, $\ell_1/\ell_2$-norm), aiming to
promote the structure of $x_0$ (e.g. sparse, low-rank, block-sparse).
$\lambda > 0$ is the regularizer parameter and is scaled
with the standard deviation $\sigma$ of the noise vector $z$, which is
typically modeled to have entries i.i.d. $\mathcal{N}(0, \sigma^2)$. The term
"LASSO" was coined by Tibshirani [2], who first introduced
(1) with $f$ chosen as the $\ell_1$-norm. In this view, (1) is a natural
generalization to other structures and convex regularizers. We
have added the indicator "$\ell_2^2$" to distinguish (1) from a variant
which takes the form [1], [3]:

$$\hat{x} := \arg\min_x \|y - Ax\|_2 + \mu f(x). \qquad (2)$$

We call this the Generalized $\ell_2$-LASSO, but it is also known
in related literature (e.g. [3]) as the square-root LASSO. The
two optimizations in (1) and (2) are fundamentally related:
from optimality conditions there exists a mapping between
the regularizer parameters $\lambda$ and $\mu$ for which the performance
is equivalent. However, not only is this mapping non-trivial
to characterize, but there also exist other differentiating features.
For instance, note that in (2) the regularizer parameter
$\mu$ need not scale with (and is thus agnostic to) the noise variance [1],
[3]. A comparison between the two algorithms is beyond the
scope of the paper, but our result, when combined with those
of [1], naturally leads to some further related discussions
in the next sections. In what follows, we often drop the
attribute "Generalized" and simply refer to (1) and (2) as
the $\ell_2^2$-LASSO and $\ell_2$-LASSO, respectively.

B. Performance Analysis and Related Literature 

A natural measure of performance of (1) or (2) is the
Normalized Squared Error $\mathrm{NSE} := \|\hat{x} - x_0\|_2^2/\sigma^2$. To facilitate
the theoretical analysis of the NSE, it is standard
to assume that the measurement matrix $A$ is drawn at
random from some ensemble. Early well-known bounds on
the NSE were order-wise in nature (i.e. accurate only up
to constant multiplicative factors) and derived based on RIP
and Restricted Eigenvalue assumptions on the measurement
matrix [3]-[7]. To the best of our knowledge, the first precise
formulae predicting the limiting behavior of the $\ell_2^2$-LASSO
reconstruction error were provided by Donoho, Maleki,
and Montanari [8]; a proof appeared later by Bayati and
Montanari in [9]. The authors of these references consider
the $\ell_2^2$-LASSO with $\ell_1$-regularization and an i.i.d. Gaussian sensing
matrix $A$, and use the Approximate Message Passing (AMP)
framework for the analysis (also see subsequent related
works [10], [11]). More recently, Stojnic [12] introduced an
alternative framework and used it to derive a tight upper
bound on the NSE of the following constrained version of
the LASSO:

$$\min_x \|y - Ax\|_2 \quad \text{s.t.} \quad \|x\|_1 \le \|x_0\|_1. \qquad (3)$$

Stojnic's approach cleverly uses a comparison lemma due
to Gordon [13], known as the Gaussian min-max Theorem
(GMT). What allowed him to use this machinery in the first
place was the observation that (3) can be equivalently
expressed as a min-max problem as follows:

$$\min_x \max_{\|u\| \le 1} u^T(y - Ax) \quad \text{s.t.} \quad \|x\|_1 \le \|x_0\|_1. \qquad (4)$$

It turns out that this form is appropriate for the application
of GMT. The same idea was used in [1] to generalize the
results of [12] to arbitrary convex regularizer functions in (3).
However, the main contribution of [1] is the extension
of the results to the generalized $\ell_2$-LASSO. The presence of
the regularizer parameter $\mu$ in (2) makes the extension non-trivial,
and considerable effort had to be undertaken in [1].
Of course, the observation that allows the use of the
GMT in the first place is the same as in (4); namely, (2)
can be expressed as

$$\min_x \max_{\|u\| \le 1} u^T(y - Ax) + \mu f(x). \qquad (5)$$

At that time it wasn't clear to the authors of [1] how to
leverage the objective function in (1) and analyze the NSE of
the $\ell_2^2$-LASSO under the same machinery. However, making
an "educated guess" on the formula that governs the mapping
between the two versions of the LASSO, they were able to
translate results from (2) to (1). This led them to conjecture
a formula for the upper bound on the NSE of the $\ell_2^2$-LASSO,
which was also suggested by numerical simulations.

C. Our Contribution 

In this work, we rigorously establish the conjecture raised
in [1] on the NSE of the Generalized $\ell_2^2$-LASSO under
i.i.d. Gaussian measurements. Instead of worrying about the
mapping function between (1) and (2) and translating the
results from the latter to the former, we follow a direct
approach. The key observation is that the objective function
in (1) can be appropriately linearized for the purpose of using
the GMT, and be written equivalently as:

$$\min_x \max_u \; u^T(y - Ax) - \tfrac{1}{2}\|u\|^2 + \lambda\sigma f(x).$$

Beyond this trick, what facilitates our analysis is a result 
from [15]. Essentially, [15] builds a clear, concrete and 
easy to apply framework based on Stojnic’s original idea 
of combining GMT with convexity. This allows a more 
insightful and compact analysis when compared to [1], [12]. 
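
The linearization rests on the elementary identity $\frac{1}{2}\|a\|^2 = \max_u\, u^T a - \frac{1}{2}\|u\|^2$, whose maximizer is $u = a$. The following is a minimal numerical sanity check of that identity, written as a sketch in plain Python; the helper names and the random test vector are illustrative assumptions and are not part of the paper.

```python
import random

def half_sq_norm(a):
    """Return (1/2) * ||a||_2^2."""
    return 0.5 * sum(x * x for x in a)

def linearized(a, u):
    """Return u^T a - (1/2) * ||u||_2^2, the objective inside the max."""
    inner = sum(ui * ai for ui, ai in zip(u, a))
    return inner - 0.5 * sum(ui * ui for ui in u)

rng = random.Random(0)                      # fixed seed, illustrative only
a = [rng.gauss(0.0, 1.0) for _ in range(5)]

# The maximizer is u = a, where the objective equals (1/2)||a||^2.
at_maximizer = linearized(a, a)

# Any other u gives a value that is no larger.
random_trials = [
    linearized(a, [rng.gauss(0.0, 1.0) for _ in a]) for _ in range(1000)
]
```

Since $u^T a - \frac{1}{2}\|u\|^2 = \frac{1}{2}\|a\|^2 - \frac{1}{2}\|u - a\|^2$, every random trial is bounded by the value at $u = a$.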


II. Result

A. Setup 

Let $x_0 \in \mathbb{R}^n$, $y = Ax_0 + \sigma v \in \mathbb{R}^m$ and convex
$f : \mathbb{R}^n \to \mathbb{R}$. The $\ell_2^2$-LASSO solves (1) for $\lambda > 0$. The
reconstruction vector $\hat{x}$ depends explicitly on $A, \lambda, \sigma, f$, and
implicitly on $v, x_0$ through the measurement vector $y$. Define
the Normalized Squared-Error of (1) as

$$\mathrm{NSE}(\sigma) := \|\hat{x} - x_0\|_2^2/\sigma^2. \qquad (6)$$


1) Assumptions: We assume that the entries of $A$ and $v$
are i.i.d. $\mathcal{N}(0, 1)$. The regularizer $f : \mathbb{R}^n \to \mathbb{R}$ is convex
and continuous. Also, $x_0$ is not a minimizer of $f$. Popular
regularizers include the $\ell_1$-norm, nuclear norm, $\ell_{1,2}$-norm,
etc. (please refer to [16], [17] for further examples).

2) Large system limit: Our results hold in an asymptotic
regime in which the problem dimensions grow to infinity. We
consider a sequence of problem instances $\{A, v, x_0, f\}_{m,n}$
as in (1), indexed by $m$ and $n$ such that both $m, n \to \infty$. In
each problem instance, $A$, $v$ and $f$ satisfy the assumptions
of Section II-A.1. Furthermore, $\hat{x}$ and $\mathrm{NSE}(\sigma)$ denote the
output of (1) and the corresponding NSE. To keep notation
simple, we avoid introducing explicitly the dependence of
variables on the problem dimensions $m, n$.


3) NSE: worst-case and asymptotic: Define the worst-case
NSE as $\mathrm{wNSE} := \sup_{\sigma > 0} \mathrm{NSE}(\sigma)$. We say that
recovery of $x_0$ by means of (1) is robust whenever $\mathrm{wNSE} < \infty$.
Further, define the asymptotic NSE as $\mathrm{aNSE} :=
\lim_{\sigma \to 0} \mathrm{NSE}(\sigma)$. Theorem 2.1 in Section II-B derives a
precise expression for aNSE in the large system limit.
In Section II-C we conjecture that under our assumptions
$\mathrm{aNSE} = \mathrm{wNSE}$, which highlights the significance of studying
the aNSE. Recent results [1], [8], [12], [18], [19] have
shown that wNSE is also achieved in the limit $\sigma \to 0$
for algorithms of nature similar to (1) under similar setups.
Please also refer to the relevant discussion (on the similarly
defined notion of noise-sensitivity) in [20].


4) Gaussian Squared Distance:

The subdifferential of $f$ at $x_0$ is the set of vectors:

$$\partial f(x_0) = \{ s \in \mathbb{R}^n \mid f(x_0 + u) \ge f(x_0) + s^T u, \ \forall u \in \mathbb{R}^n \}.$$

It is nonempty, convex and compact [21]. Also, it does
not contain the origin (recall $x_0$ is not a minimizer). For
any nonnegative number $\tau \ge 0$, denote the scaled (by $\tau$)
subdifferential set as $\tau\partial f(x_0) = \{\tau s \mid s \in \partial f(x_0)\}$. Also,
for the conic hull of the subdifferential $\partial f(x_0)$, write
$\mathrm{cone}(\partial f(x_0)) = \{s \mid s \in \tau\partial f(x_0), \text{ for some } \tau \ge 0\}$. For
a nonempty, convex, closed set $\mathcal{C} \subset \mathbb{R}^n$ and $u \in \mathbb{R}^n$, denote
the projection and distance as $\pi_{\mathcal{C}}(u) := \arg\min_{s \in \mathcal{C}} \|u - s\|_2$
and $\mathrm{dist}(u, \mathcal{C}) = \|u - \pi_{\mathcal{C}}(u)\|_2$.
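
To make the projection and distance concrete, consider the $\ell_1$-norm: its subdifferential at $x_0$ fixes $s_i = \mathrm{sign}(x_{0,i})$ on the support of $x_0$ and allows $|s_i| \le 1$ elsewhere, so projecting onto the scaled set is a coordinate-wise operation. The sketch below assumes $f = \|\cdot\|_1$; the function names and the small example vectors are hypothetical, chosen only for illustration.

```python
import math

def project_onto_scaled_subdiff_l1(h, x0, tau):
    """Project h onto tau * (subdifferential of the l1-norm at x0)."""
    proj = []
    for hi, xi in zip(h, x0):
        if xi != 0.0:
            # On the support the set is the single point tau * sign(x0_i).
            proj.append(tau * math.copysign(1.0, xi))
        else:
            # Off the support the set is the interval [-tau, tau]: clip h_i.
            proj.append(max(-tau, min(tau, hi)))
    return proj

def dist_to_scaled_subdiff_l1(h, x0, tau):
    """Euclidean distance from h to tau * (subdifferential of l1 at x0)."""
    proj = project_onto_scaled_subdiff_l1(h, x0, tau)
    return math.sqrt(sum((hi - pi) ** 2 for hi, pi in zip(h, proj)))

x0 = [2.0, 0.0, -1.0, 0.0]      # a 2-sparse example signal
h = [0.5, 1.5, 0.3, -0.2]       # a fixed vector standing in for a Gaussian draw
proj = project_onto_scaled_subdiff_l1(h, x0, tau=1.0)
d = dist_to_scaled_subdiff_l1(h, x0, tau=1.0)
```

Off the support, the residual $h_i - \pi(h)_i$ is the soft-thresholded value of $h_i$, which is why soft-thresholding appears in closed-form expressions for the $\ell_1$ case.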

Definition 2.1 (Gaussian squared distance): Assume $f :
\mathbb{R}^n \to \mathbb{R}$ convex. Let $h \in \mathbb{R}^n$ have i.i.d. $\mathcal{N}(0,1)$ entries.
The Gaussian squared distance to the scaled subdifferential
is defined as

$$D(\tau) := \mathbb{E}_h\left[\mathrm{dist}^2(h, \tau\partial f(x_0))\right]. \qquad (7)$$

$D(\tau)$ appears as a fundamental quantity in the study of the
phase transitions of noiseless compressive sensing: it has
been shown that

$$m > \min_{\tau \ge 0} D(\tau) \approx \mathbb{E}\left[\mathrm{dist}^2(h, \mathrm{cone}(\partial f(x_0)))\right] \qquad (8)$$

is sufficient [16], [22] and necessary [17] for the recovery of
$x_0$ from noiseless linear observations. Thus, it is no surprise
that the properties of $D(\tau)$ have been analyzed in detail
in [17, Lem. C.2] (also, [1, Lem. 8.1]). The same quantity
plays a central role in the analysis of the noisy case considered
here; we make precise reference to relevant properties
whenever they appear useful throughout our exposition. For
the statement of our results, we need the following: $D(\tau)$ is
differentiable for $\tau > 0$ and $dD(\tau)/d\tau = -(2/\tau)C(\tau)$, where

$$C(\tau) := \mathbb{E}_h\left[\left(h - \pi_{\tau\partial f(x_0)}(h)\right)^T \pi_{\tau\partial f(x_0)}(h)\right]. \qquad (9)$$

To familiarize with the definitions in (7) and (9), it is
instructive to specialize to the case where $f$ is the $\ell_1$-norm
and $x_0$ is a $k$-sparse vector. Then, $\partial f(x_0)$ has a
simple characterization and $D(\tau)$, $C(\tau)$ admit simple closed-form
expressions in terms of the tail distribution $Q(\cdot)$ of a
standard Gaussian (e.g., [1, App. H]):

$$D(\tau) = k(1 + \tau^2) + (n - k)\left(2(1 + \tau^2)Q(\tau) - \sqrt{\tfrac{2}{\pi}}\,\tau e^{-\tau^2/2}\right),$$

$$C(\tau) = -k\tau^2 + (n - k)\left(\sqrt{\tfrac{2}{\pi}}\,\tau e^{-\tau^2/2} - 2\tau^2 Q(\tau)\right). \qquad (10)$$
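
The closed forms for the $\ell_1$ case can be evaluated with nothing more than the complementary error function. The sketch below is an illustration under stated assumptions: it uses the sign convention for $C(\tau)$ that is forced by the identity $dD(\tau)/d\tau = -(2/\tau)C(\tau)$ together with definition (9), the function names are hypothetical, and the values of $n$, $k$, $\tau$ are arbitrary. It compares a central finite difference of $D$ with $-(2/\tau)C(\tau)$.

```python
import math

def gaussian_tail(t):
    """Q(t) = P(Z > t) for a standard normal Z."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def gaussian_pdf(t):
    """Standard normal density; note 2 * t * pdf(t) = sqrt(2/pi) * t * exp(-t^2/2)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def D_l1(tau, n, k):
    """Gaussian squared distance for the l1-norm at a k-sparse x0 in R^n."""
    off_support = (2.0 * (1.0 + tau ** 2) * gaussian_tail(tau)
                   - 2.0 * tau * gaussian_pdf(tau))
    return k * (1.0 + tau ** 2) + (n - k) * off_support

def C_l1(tau, n, k):
    """E[(h - proj)^T proj] for the l1-norm at a k-sparse x0 in R^n."""
    off_support = (2.0 * tau * gaussian_pdf(tau)
                   - 2.0 * tau ** 2 * gaussian_tail(tau))
    return -k * tau ** 2 + (n - k) * off_support

n, k, tau, eps = 1000, 50, 1.3, 1e-5          # illustrative values only
numeric_slope = (D_l1(tau + eps, n, k) - D_l1(tau - eps, n, k)) / (2.0 * eps)
predicted_slope = -(2.0 / tau) * C_l1(tau, n, k)
```

At $\tau = 0$ the scaled subdifferential collapses to the origin, so $D(0) = \mathbb{E}\|h\|^2 = n$, which is a quick second check on the formula.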


B. Result 

Recall the definitions of $D(\tau)$ and $C(\tau)$ in (7) and (9).

1) Regime of operation: Our results hold in the asymptotic
linear regime, where $m$, $n$ and $D(\tau)$ all grow to
infinity such that $m/n \to \delta \in (0, \infty)$ and $(1 - \epsilon)m >
\min_{\tau \ge 0} D(\tau) > \epsilon m$ for constant $\epsilon > 0$. The assumption
$m > \min_{\tau \ge 0} D(\tau)$ is motivated by (8).

2) Preliminaries: 

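Definition 2.2 (map): Let $\mathcal{R} := \{\tau > 0 \mid m - D(\tau) >
\max\{0, C(\tau)\}\}$ and define $\mathrm{map} : \mathcal{R} \to (0, \infty)$:

$$\mathrm{map}(\tau) := \tau\, \frac{m - D(\tau) - C(\tau)}{\sqrt{m - D(\tau)}}. \qquad (11)$$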

The next lemma shows that the inverse of map is well 
defined. 

Lemma 2.1 ($\mathrm{map}^{-1}$, [1]): Assume $m > \min_{\tau \ge 0} D(\tau)$.
Then, $\mathcal{R}$ is a nonempty open interval and map is strictly
increasing, continuous and bijective. In particular, its inverse
function $\mathrm{map}^{-1} : (0, \infty) \to \mathcal{R}$ is well defined.
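
Because map is strictly increasing on the interval $\mathcal{R}$, its inverse can be computed by bisection. The sketch below does this for the $\ell_1$ case using the closed forms for $D$ and $C$; it is an illustration under assumptions (all function names, the search bracket `tau_max`, and the problem sizes are hypothetical, and it presumes $\min_\tau D(\tau) < m < n$).

```python
import math

def Q(t):
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def phi(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def D(tau, n, k):
    return k * (1 + tau**2) + (n - k) * (2 * (1 + tau**2) * Q(tau) - 2 * tau * phi(tau))

def C(tau, n, k):
    return -k * tau**2 + (n - k) * (2 * tau * phi(tau) - 2 * tau**2 * Q(tau))

def bisect_root(fn, lo, hi, iters=200):
    """Root of fn on [lo, hi], assuming fn < 0 near lo and fn > 0 near hi."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fn(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def map_fn(tau, m, n, k):
    """map(tau) of Definition 2.2; valid only for tau inside the region R."""
    gap = m - D(tau, n, k)
    return tau * (gap - C(tau, n, k)) / math.sqrt(gap)

def region_endpoints(m, n, k, tau_max=50.0):
    """Endpoints of R = {tau > 0 : m - D(tau) > max(0, C(tau))}."""
    # Minimizer of D: C changes sign from positive to negative there.
    tau_best = bisect_root(lambda t: -C(t, n, k), 1e-9, tau_max)
    # Left endpoint: m - D - C crosses zero on (0, tau_best), where C > 0.
    left = bisect_root(lambda t: m - D(t, n, k) - C(t, n, k), 1e-9, tau_best)
    # Right endpoint: D crosses m on (tau_best, tau_max), where C < 0.
    right = bisect_root(lambda t: D(t, n, k) - m, tau_best, tau_max)
    return left, right

def map_inverse(lam, m, n, k):
    """Solve map(tau) = lam for tau in R by bisection."""
    left, right = region_endpoints(m, n, k)

    def shifted(t):
        if m - D(t, n, k) <= 0:          # numerically outside R on the right
            return float("inf")
        return map_fn(t, m, n, k) - lam

    return bisect_root(shifted, left, right)

m, n, k = 500, 1000, 50                  # m/n = 0.5, k/n = 0.05
tau = map_inverse(24.0, m, n, k)
eta = D(tau, n, k) / (m - D(tau, n, k))  # the ratio D / (m - D) at this tau
```

The last line evaluates $D(\tau)/(m - D(\tau))$ at $\tau = \mathrm{map}^{-1}(\lambda)$, which is the quantity that enters the main result.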

3) Theorem: Recall the assumptions of Section II-A.1.
Assume a large system setup as in Section II-A.2 under
the linear regime. Theorem 2.1 characterizes the limiting
behavior of the asymptotic normalized squared error of (1).

Theorem 2.1: Fix any $\lambda > 0$ in (1) and let

$$\mathrm{aNSE} := \lim_{\sigma \to 0} \mathrm{NSE}(\sigma) = \lim_{\sigma \to 0} \frac{\|\hat{x} - x_0\|_2^2}{\sigma^2}.$$

The following limit holds in probability:

$$\lim_{n \to \infty} \mathrm{aNSE} = \frac{D(\mathrm{map}^{-1}(\lambda))}{m - D(\mathrm{map}^{-1}(\lambda))} =: \eta(\lambda).$$

C. Remarks 


1) The role of the parameters: Theorem 2.1 explicitly
captures the role of the number of measurements $m$, the
regularizer $f$, the unknown signal $x_0$ and the regularizer
parameter $\lambda$. The dependence on the ambient dimension $n$
is implicit through $x_0$.

2) The mapping: The theorem maps the regularizer parameter
$\lambda > 0$ to some value $\tau \in \mathcal{R}$ through $\mathrm{map}^{-1}$. Note
that $\mathcal{R}$ is nonempty as long as $m > \min_\tau D(\tau)$ (Lemma 2.1).
Figure 1 illustrates the action of $\mathrm{map}^{-1}$ for an instance of
a sparse recovery problem.

3) Geometric nature: The structure induced by $f$, the
particular $x_0$ we are trying to recover and the value of $\lambda$ are
all summarized in a single parameter, namely, the Gaussian
squared distance to the subdifferential.

4) Generality: In principle, Theorem 2.1 holds for any
convex regularizer $f$. Thus, it applies to any signal class that
exhibits some sort of low-dimensionality. In this sense, it
extends to the noisy case the unifying treatment of convex
regularizers, which has been adopted in the analysis of
noiseless compressive sensing [16], [17].


Fig. 1: Illustration of the region $\mathcal{R}$ and of the map function (Defn. 2.2)
for $f = \|\cdot\|_1$ and $x_0 \in \mathbb{R}^n$ a $k$-sparse vector ($m/n = 0.5$, $k/n = 0.05$).
$\mathrm{map}^{-1}$ maps the value of the regularizer $\lambda$ in (1) to a value in $\mathcal{R}$.
$D(\tau)$ and $C(\tau)$ are computed as in (10).

Fig. 2: Numerical validation of Theorem 2.1 for $f = \|\cdot\|_1$ and $x_0 \in \mathbb{R}^n$
a $k$-sparse vector ($m/n = 0.5$, $k/n = 0.05$, $n = 260$). Measured values of
$\mathrm{NSE}(\sigma)$ are averages over 50 realizations of $A, v$. The theorem
accurately predicts $\mathrm{NSE}(\sigma)$ as $\sigma \to 0$. Results support our claim that
$\mathrm{aNSE} = \mathrm{wNSE}$. $\lambda_{\mathrm{best}}$ is the value of the optimal regularizer as
predicted by Lemma 2.2.


5) On the worst-case NSE: We conjecture that

$$\mathrm{wNSE} := \sup_{\sigma > 0} \mathrm{NSE}(\sigma) = \lim_{\sigma \to 0} \mathrm{NSE}(\sigma) =: \mathrm{aNSE}. \qquad (12)$$

Theorem 2.1 would then imply that for any $\sigma > 0$:

$$\lim_{n \to \infty} \mathrm{NSE}(\sigma) \le \eta(\lambda),$$

in probability. There are several reasons that suggest this
claim. First, $\mathrm{wNSE} = \mathrm{aNSE}$ has already been shown to
hold for algorithms similar to (1) such as: i) the constrained
generalized LASSO in (3), [1], [8], [12], ii) the proximal
denoiser [18], [19], which is essentially (1) when $m = n$
and $A = I_n$. Furthermore, our conjecture is supported by
computational experiments; see Figure 2 and [1, Sec. 13].

6) Evaluating the bound: Evaluating the bound of Theorem 2.1
for particular instances of structures and regularizers
requires the ability to compute $D(\tau)$ and $C(\tau)$. It is important
to note that this only requires knowledge of the particular
structure of the unknown signal $x_0$, and not the explicit
unknown signal itself. For example, in sparse recovery, $D(\tau)$
is the same for all $k$-sparse signals (see (10) and Fig. 1).










































7) Optimal tuning: Thm. 2.1 suggests a simple recipe for
finding the optimal value $\lambda_{\mathrm{best}}$ of the regularizer parameter.

Lemma 2.2: Recall $\eta(\lambda)$ as defined in Theorem 2.1. Let
$\lambda_{\mathrm{best}} := \arg\min_{\lambda > 0} \eta(\lambda)$ and $\tau_{\mathrm{best}} := \arg\min_{\tau \ge 0} D(\tau)$.
Then, $\lambda_{\mathrm{best}} = \tau_{\mathrm{best}} \sqrt{m - D(\tau_{\mathrm{best}})}$.

The proof of the lemma is not involved and is omitted for
brevity. It is shown in [17, Lem. C.2] that $D(\tau)$ is strictly
convex. Thus, $\tau_{\mathrm{best}}$ can be efficiently calculated as the unique
solution to a convex program. This determines $\lambda_{\mathrm{best}}$. Note
that even though calculating $\lambda_{\mathrm{best}}$ does not require explicit
knowledge of $x_0$ itself, it does assume knowledge of the
particular structure. For instance, in sparse recovery we need
to know the sparsity level $k$ (see Fig. 2).
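
The tuning recipe of Lemma 2.2 can be sketched numerically for sparse recovery. Since $dD(\tau)/d\tau = -(2/\tau)C(\tau)$ and $D$ is strictly convex, $\tau_{\mathrm{best}}$ is the unique root of $C$, which the sketch below locates by bisection. The closed forms assume $f = \|\cdot\|_1$ and a $k$-sparse $x_0$; the function names, the bracket $[10^{-9}, 50]$ and the problem sizes are illustrative assumptions.

```python
import math

def Q(t):
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def phi(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def D(tau, n, k):
    return k * (1 + tau**2) + (n - k) * (2 * (1 + tau**2) * Q(tau) - 2 * tau * phi(tau))

def C(tau, n, k):
    return -k * tau**2 + (n - k) * (2 * tau * phi(tau) - 2 * tau**2 * Q(tau))

def tau_best(n, k, lo=1e-9, hi=50.0, iters=200):
    """Minimizer of D, found as the root of C (C > 0 to its left, C < 0 to its right)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if C(mid, n, k) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, m, k = 1000, 500, 50                      # m/n = 0.5, k/n = 0.05
t_best = tau_best(n, k)
d_min = D(t_best, n, k)                      # min over tau of D(tau)
lam_best = t_best * math.sqrt(m - d_min)     # Lemma 2.2
eta_best = d_min / (m - d_min)               # predicted aNSE under optimal tuning
```

Only $n$, $m$ and the sparsity level $k$ enter the computation, in line with the remark that the structure of $x_0$, rather than $x_0$ itself, is what must be known.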

8) Phase-transitions: Combining Theorem 2.1 with
Lemma 2.2, it holds with probability one that

$$\lim_{\sigma \to 0} \min_{\lambda > 0} \frac{\|\hat{x} - x_0\|_2^2}{\sigma^2} = \frac{\min_\tau D(\tau)}{m - \min_\tau D(\tau)}.$$

In view of the wNSE conjecture in (12), the quantity on the
left hand side can be viewed as the minimax NSE of the
$\ell_2^2$-LASSO for a fixed signal $x_0$. While $m > \min_\tau D(\tau)$, we
can always tune (1) to guarantee robust recovery. However, as
the number of measurements $m$ approaches $\min_\tau D(\tau)$, then,
even after optimal tuning, the NSE grows to $\infty$. This phase-transition
characterizing the robustness of (1) is identical to
(8), i.e. the phase-transition in noiseless compressed sensing.
This observation was first formally predicted in [8], and later
proved in [9] and [12], for $f = \|\cdot\|_1$ and $x_0$ $k$-sparse.

9) Robustness: Theorem 2.1 reveals the following interesting
feature of (1). Given a sufficient number of measurements
$m > \min_\tau D(\tau)$, the recovery is robust for all choices
of the regularizer parameter $\lambda > 0$. In particular, this is
in contrast to the $\ell_2$-LASSO in (2). It was shown in [1],
[23] that the NSE of the latter becomes unbounded if the
regularizer parameter is larger than some $\lambda_{\max}$.

10) Relevant literature: Most error bounds derived in the
literature for (1) are order-wise. The first precise results
were derived in the context of sparse recovery via the AMP
framework: [8] develops formal expressions for the wNSE
of (1) under optimal tuning of the regularizer parameter
$\lambda > 0$; [9] explicitly characterizes $\mathrm{NSE}(\sigma)$ for all values
of $\lambda > 0$ and all $\sigma > 0$. The rest of the works that
we list here use the GMT framework. [1], [12] precisely
characterize the wNSE of (3). [1] computes the aNSE of
(2). The $\mathrm{NSE}(\sigma)$ of (2) with $\ell_1$-regularization but arbitrary
$\sigma > 0$ has been characterized by the authors in [24]. Theorem
2.1 characterizes the aNSE of the generalized $\ell_2^2$-LASSO.


III. Proof Outline 

We outline the main steps of the proof here. Most of
the technical details are deferred to the Appendix. Before
everything, we re-write (1) by changing the decision variable
to be the error vector $w = x - x_0$:

$$\hat{w} := \arg\min_w \tfrac{1}{2}\|Aw - \sigma v\|_2^2 + \sigma\lambda f(x_0 + w). \qquad (13)$$

Theorem 2.1 states a precise expression for the limiting
behavior $\lim_{\sigma \to 0} \|\hat{w}\|^2/\sigma^2$. Throughout the analysis, we fix
any $\lambda > 0$. Also, we simply write $\|\cdot\|$ instead of $\|\cdot\|_2$.


A. First-order Approximation 

We start with a useful approximation to (13). The idea is
that in the regime of interest we expect $\hat{w}$ to scale linearly
with $\sigma$. Thus, in the limit $\sigma \to 0$, $\|\hat{w}\|$ is sufficiently small
such that $f(x_0 + w) \approx f(x_0) + \sup_{s \in \partial f(x_0)} s^T w$. Note that
this always holds with a "$\ge$" sign due to convexity. What
we show in the Appendix is essentially that introducing this
approximation in (13) does not alter $\|\hat{w}\|$ in the limit $\sigma \to 0$.

B. Gaussian min-max Theorem 

We get a handle on (13) and its optimal value by analyzing
a different and simpler optimization problem. The
machinery that allows this relies on Gordon's Gaussian min-max
theorem (GMT) [13, Lem. 3.1]. In fact, we require
a stronger version of the GMT that can be obtained when
accompanied with additional convexity assumptions that are
not present in its original formulation. The fundamental idea
is attributed to Stojnic [12]. [15] builds upon this and derives
a concrete and somewhat extended statement of the result in
[15, Thm. II.1]. Please refer to the discussion in [15] for
further details on the GMT, the role of convexity, and the
differences between [13, Lem. 3.1], [12] and [15, Thm. II.1].
We summarize the result of [15, Thm. II.1] in the next few
lines. Let $G \in \mathbb{R}^{m \times n}$, $g \in \mathbb{R}^m$, $h \in \mathbb{R}^n$ have entries i.i.d.
Gaussian; let $\mathcal{S}_a \subset \mathbb{R}^n$, $\mathcal{S}_b \subset \mathbb{R}^m$ be convex compact sets, and
$\psi : \mathcal{S}_a \times \mathcal{S}_b \to \mathbb{R}$ be convex-concave and continuous. Further
consider the following two min-max problems:

$$\Phi(G) := \min_{a \in \mathcal{S}_a} \max_{b \in \mathcal{S}_b} \; b^T G a + \psi(a, b), \qquad (14)$$

$$\phi(g, h) := \min_{a \in \mathcal{S}_a} \max_{b \in \mathcal{S}_b} \; \|a\| g^T b - \|b\| h^T a + \psi(a, b). \qquad (15)$$

Then, for any $\mu \in \mathbb{R}$, $t > 0$:

$$\mathbb{P}\left(|\Phi(G) - \mu| > t\right) \le 2\,\mathbb{P}\left(|\phi(g, h) - \mu| > t\right).$$

Thus, if the optimal cost $\phi(g, h)$ of (15) concentrates to some
value $\mu$, the same is true for $\Phi(G)$. This suggests analyzing
(15) instead of (14), and indirectly drawing conclusions for the
latter. The premise is that the optimization in (15) is easier
to analyze; we often refer to it as "Gordon's optimization"
following [15]. Assuming a setup in which the problem
dimensions $m, n$ grow to infinity, it is shown in [15] that
if $\phi(g, h)$ converges in probability to a deterministic value $d_*$,
then so does $\Phi(G)$. What is more, if $\|a_*(g, h)\|$ converges to,
say, $\alpha_*$, and some appropriate strong convexity assumption on
the objective function of (15) is satisfied, then $\|a_*(G)\|$ also
converges to $\alpha_*$. Here, we have denoted $a_*(g, h)$, $a_*(G)$ for
the minimizers in (15) and (14), respectively; refer to [15]
and Lemma 1.2 for the exact statements. As might be already
suspected, this latter property is of interest to our problem. In
what follows, we bring (13) in the format of (14), derive the
corresponding "Gordon's optimization" problem and analyze
the minimizer of that one instead.










C. Gordon’s Optimization 

We use the fact that $\frac{1}{2}\|a\|^2 = \max_b\, b^T a - \frac{1}{2}\|b\|^2$
to equivalently express $\hat{w}$ as the solution to (also, recall the
first-order approximation)

$$\min_w \max_b \; b^T A w - \sigma b^T v - \tfrac{1}{2}\|b\|^2 + \lambda \max_{s \in \partial f(x_0)} s^T w.$$

Identify $\psi(w, b) = -\sigma b^T v - \frac{1}{2}\|b\|^2 + \lambda \max_s s^T w$, which
is convex-concave and continuous, to see that the above is in
the desired format (14). The only caveat is that the constraint
sets on $w$ and $b$ appear unbounded. This is appropriately
treated in the Appendix and we do not elaborate any further
here. The corresponding "Gordon's optimization" problem
writes:

$$\min_w \max_{s, b} \; \|w\| g^T b - \|b\| h^T w - \sigma b^T v - \tfrac{1}{2}\|b\|^2 + \lambda s^T w,$$

where the variable $s$ is constrained in $\partial f(x_0)$, but we omit this to
shorten notation. In the next lines, we show how to simplify
this optimization to a scalar problem. Recall that $g$, $v$ both
have entries i.i.d. $\mathcal{N}(0, 1)$ and are independent of each other;
thus $\|w\| g - \sigma v$ has entries $\mathcal{N}(0, \|w\|^2 + \sigma^2)$. Also, note
that the maximization over the direction of $b$ is easy to
perform, since $\max_{\|b\| = \beta} g^T b = \beta\|g\|$, $\beta \ge 0$. With these,
and some abuse of notation so that $g$ continues being i.i.d.
standard normal, we may rewrite the above as:

$$\min_w \max_{s,\, \beta \ge 0} \; \sqrt{\|w\|^2 + \sigma^2}\, \|g\| \beta - (\beta h - \lambda s)^T w - \beta^2/2.$$

Observe that the objective function is now convex in $w$ and
concave in $\beta, s$. Thus, modulo compactness of the constraint
sets (see Appendix for details), we can flip the order of
minimization and maximization [21, Cor. 37.3.2], and write:

$$\max_{s,\, \beta \ge 0} \min_w \; \sqrt{\|w\|^2 + \sigma^2}\, \|g\| \beta - (\beta h - \lambda s)^T w - \beta^2/2.$$

But now it is easy to perform the minimization over the
direction of $w$. Doing this, and letting $\alpha$ represent $\|w\|$:

$$\max_{s,\, \beta \ge 0} \min_{\alpha \ge 0} \; \sqrt{\alpha^2 + \sigma^2}\, \|g\| \beta - \alpha\beta \left\| h - \tfrac{\lambda}{\beta} s \right\| - \beta^2/2.$$

We are almost done with the simplifications. One last step
amounts to flipping the order of min-max once more (the
objective is appropriately concave-convex) and performing
the maximization over $s$, which results in the appearance of
the distance term below:

$$\min_{\alpha \ge 0} \max_{\beta \ge 0} \; \sqrt{\alpha^2 + \sigma^2}\, \|g\| \beta - \alpha\beta\, \mathrm{dist}\!\left(h, \tfrac{\lambda}{\beta} \partial f(x_0)\right) - \beta^2/2. \qquad (16)$$

D. Analysis of Gordon’s Optimization 

In (16), the variable $\alpha$ plays the role of $\|w\|$. Thus, from
the discussion in Section III-B, if we find the value to
which the optimal $\alpha_*(g, h)$ in (16) converges, then we may
conclude that the desired quantity $\|\hat{w}(A, v)\|$ also converges
to the same value. This will establish Theorem 2.1. Assume
the asymptotic regime that holds for Theorem 2.1. We only
highlight the main ideas here and defer most of the details
to the Appendix.

Both functions $\|g\|$ and $\mathrm{dist}(h, (\lambda/\beta)\partial f(x_0))$ in (16) are
1-Lipschitz in their arguments. Then, the classical Gaussian
concentration of Lipschitz functions implies that they concentrate
around $\sqrt{m}$ and $\sqrt{D(\lambda/\beta)}$, respectively (e.g. [1,
Lem. B.2]). We use this in the Appendix to prove that (16)
converges in probability (after proper normalization) to

$$\min_{\alpha \ge 0} \max_{\beta \ge 0} \; \sqrt{\alpha^2 + \sigma^2}\, \sqrt{m}\, \beta - \alpha\beta \sqrt{D(\lambda/\beta)} - \beta^2/2. \qquad (17)$$

Moreover, the minimizer of (16) converges to the minimizer
$\alpha_*$ of the deterministic minimization in (17). To compute
$\alpha_*$, we use duality (the objective is (strictly) convex in $\alpha$
and concave in $\beta$). First, fix $\beta$, differentiate the objective in
(17) w.r.t. $\alpha$ and equate to 0 to find that it is minimized at

$$\alpha_*(\beta) = \sigma \sqrt{D(\lambda/\beta)} \Big/ \sqrt{m - D(\lambda/\beta)}.$$

Substituting this value back in (17) and differentiating now
with respect to $\beta$ yields the optimal $\beta_* = \lambda/\mathrm{map}^{-1}(\lambda)$.
Note that $\alpha_*(\beta_*)$ agrees with the expression of the theorem
to conclude.
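
For concreteness, the first of these two steps (the minimization over $\alpha$ for fixed $\beta$) can be worked out explicitly. The following is a sketch that uses only the objective in (17), abbreviates $D := D(\lambda/\beta)$ for readability, and assumes $D < m$:

```latex
\begin{align*}
0 &= \frac{\partial}{\partial\alpha}\Big[\sqrt{\alpha^2+\sigma^2}\,\sqrt{m}\,\beta
      - \alpha\beta\sqrt{D} - \tfrac{\beta^2}{2}\Big]
   = \beta\Big(\frac{\alpha\sqrt{m}}{\sqrt{\alpha^2+\sigma^2}} - \sqrt{D}\Big) \\
  &\Longrightarrow\quad \alpha^2 m = D\,(\alpha^2+\sigma^2)
   \quad\Longrightarrow\quad
   \alpha_*(\beta) = \sigma\sqrt{\frac{D}{m-D}}, \\
  &\text{and substituting back:}\quad
   \sqrt{\alpha_*^2+\sigma^2}\,\sqrt{m}\,\beta - \alpha_*\beta\sqrt{D} - \tfrac{\beta^2}{2}
   = \sigma\beta\sqrt{m-D} - \tfrac{\beta^2}{2}.
\end{align*}
```

In particular $\alpha_*(\beta)^2/\sigma^2 = D/(m - D)$, which has the same form as the expression in Theorem 2.1.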
Appendix

In Section III we outlined the proof of Theorem 2.1. Here,
we provide a complete proof of the theorem.

A. Preliminaries

We rewrite (1) in a more convenient format for the
purposes of the analysis. In particular, (i) substitute $y =
Ax_0 + \sigma v$, (ii) subtract from the objective the constant
term $\sigma\lambda f(x_0)$, (iii) change the decision variable to the quantity
of interest, i.e. the normalized error vector $w_\sigma :=
(1/\sigma)(x - x_0)$, (iv) rescale by a factor of $1/\sigma^2$. Then,

$$\hat{w}_\sigma := \arg\min_{w_\sigma} \tfrac{1}{2}\|A w_\sigma - v\|_2^2 + \frac{\lambda}{\sigma}\left(f(x_0 + \sigma w_\sigma) - f(x_0)\right). \qquad (18)$$

We will derive a precise expression for the limiting (as
$n \to \infty$) behavior of $\lim_{\sigma \to 0} \|\hat{w}_\sigma\|_2$. Note that after the
normalization of $x - x_0$ with $\sigma$, it is not guaranteed that
the optimal minimizer in (18) is bounded (think of $\sigma \to 0$).
However, we will prove that in the regime of Theorem 2.1
this is indeed the case. Many of the arguments that we use
in the analysis require boundedness of the constraint sets.
To tackle this, we assume that $\hat{w}_\sigma$ is bounded by some large
constant $K > 0$ (with probability one over $A, v$), the value
of which is to be chosen at the end of the analysis. Recall that
at that point we will have a precise characterization of the
limiting behavior of $\|\hat{w}_\sigma\|_2$, say $\alpha_*$. If $\alpha_*$ turns out to be
independent of the value of $K$ which we started with, then
we will assume that this starting value was strictly larger
than $\alpha_*$. Thus, in what follows, we let $K, \Lambda, M$ denote such
(arbitrarily) large, but finite, positive quantities. For $K$, which
is reserved as an upper bound on $\|\hat{w}_\sigma\|$, we assume that it is
constant in the sense that it does not scale with $n$. This will
be required when we apply [15, Thm. II.2] in Section D. On
the other hand, $\Lambda, M$ are in general allowed to depend on
$n$. Also, we fix $\lambda > 0$ and write $\|\cdot\|$ instead of $\|\cdot\|_2$.


B. Gordon's Optimization for arbitrary $\sigma$

We use the fact that

$$\tfrac{1}{2}\|a\|^2 = \max_b \; b^T a - \tfrac{1}{2}\|b\|^2, \qquad (19)$$

to equivalently express $\hat{w}_\sigma$ as the solution to

$$\min_{\|w_\sigma\| \le K} \max_{\|b\| \le \Lambda} \; b^T A w_\sigma - b^T v - \tfrac{1}{2}\|b\|^2 + \frac{\lambda}{\sigma}\left(f(x_0 + \sigma w_\sigma) - f(x_0)\right).$$

In view of (19) and the boundedness of $w_\sigma$, the set of optima
of $b$ is also bounded by some $0 < \Lambda := \Lambda(K) < \infty$. This
brings (18) in the desired format (14). Then, (15) writes

$$\tilde{w}_\sigma(g, h) := \arg\min_{\|w_\sigma\| \le K} \max_{\|b\| \le \Lambda} \; \sqrt{\|w_\sigma\|^2 + 1}\; g^T b - \|b\| h^T w_\sigma - \|b\|^2/2 + (\lambda/\sigma)\left(f(x_0 + \sigma w_\sigma) - f(x_0)\right).$$

The maximization over the direction of $b$ is easy to perform;
note that $\max_{\|b\| = \beta} g^T b = \beta\|g\|_2$, $\beta \ge 0$. Also, $f$ is
continuous and convex, thus we can express it in terms
of its convex conjugate $f^*(u) = \sup_x u^T x - f(x)$. In
particular, applying [21, Thm. 12.2] we have $f(x_0 + \sigma w_\sigma) =
\sup_u \, u^T x_0 + \sigma u^T w_\sigma - f^*(u)$. The supremum here is achieved
at $u_* \in \partial f(x_0 + \sigma w_\sigma)$ [21, Thm. 23.5]. Also, from [25,
Prop. 4.2.3], $\bigcup_{\|w_\sigma\| \le K} \partial f(x_0 + \sigma w_\sigma)$ is bounded. Thus, the
set of maximizers $u_*$ is bounded and for some $0 < M :=
M(K) < \infty$, $\tilde{w}_\sigma$ is given as the solution to

$$\phi(\sigma; g, h) := \min_{\|w_\sigma\| \le K} \max_{\substack{0 \le \beta \le \Lambda \\ \|u\| \le M}} \; \sqrt{\|w_\sigma\|^2 + 1}\, \|g\| \beta - \beta^2/2 - (\beta h - \lambda u)^T w_\sigma + \frac{\lambda}{\sigma}\left(u^T x_0 - f^*(u) - f(x_0)\right). \qquad (20)$$
a 

C. Gordon's Optimization in the limit $\sigma \to 0$

[15, Thm. II.2] relates $\|\hat{w}_\sigma\|$ to $\|\tilde{w}_\sigma\|$, under appropriate
assumptions. Also, recall that we wish to characterize
$\lim_{\sigma \to 0} \|\hat{w}_\sigma\|$. Thus, in view of (20) we wish to analyze the
problem

$$\phi_0 := \phi_0(g, h) := \lim_{\sigma \to 0} \phi(\sigma; g, h).$$

In (20), from Fenchel's inequality:

$$u^T x_0 - f^*(u) - f(x_0) \le 0. \qquad (21)$$

With this observation, we prove in the next lemma that
$\phi(\sigma; g, h)$ is non-decreasing in $\sigma$; see Section E for the proof.

Lemma 1.1: Fix $g, h$ and consider $\phi(\cdot; g, h) : (0, \infty) \to
\mathbb{R}$ as defined in (20). $\phi(\sigma; g, h)$ is non-decreasing in $\sigma$.

In particular, when viewed as a function of $\kappa := \lambda/\sigma$,
$\phi(\cdot; g, h)$ is non-increasing. Thus,

$$\phi_0 = \lim_{\sigma \to 0} \phi(\sigma) = \lim_{\kappa \to \infty} \phi(\kappa) = \inf_{\kappa > 0} \phi(\kappa). \qquad (22)$$

Next, we argue that we can flip the order of min-max; we
will apply [21, Cor. 37.3.2]. The objective function in (20)
is continuous, convex in $w_\sigma$ and concave in both
$\beta, u$. The constraint sets are all convex and one of them is
bounded. With this and (22), we get

$$\phi_0(g, h) = \max_{\substack{0 \le \beta \le \Lambda \\ \|u\| \le M}} \min_{\|w_\sigma\| \le K} \inf_{\kappa > 0} \; \sqrt{\|w_\sigma\|^2 + 1}\, \|g\| \beta - \frac{\beta^2}{2} - (\beta h - \lambda u)^T w_\sigma + \kappa\left(u^T x_0 - f^*(u) - f(x_0)\right). \qquad (23)$$

Recall (21) and the fact that equality is achieved iff $u \in
\partial f(x_0)$ (e.g. [21, Thm. 23.5]). Then, $\phi_0(g, h)$ is given by

$$\max_{\substack{0 \le \beta \le \Lambda \\ u \in \partial f(x_0)}} \min_{\|w_\sigma\| \le K} \; \sqrt{\|w_\sigma\|^2 + 1}\, \|g\| \beta - (\beta h - \lambda u)^T w_\sigma - \frac{\beta^2}{2},$$

where we have assumed $\infty > M > \max_{s \in \partial f(x_0)} \|s\|$.
We can simplify this one step further by performing the
minimization over the direction of $w_\sigma$. In the problem below
note that $\alpha$ plays the role of $\|w_\sigma\|$. Thus,

$$\phi_0(g, h) = \max_{\substack{0 \le \beta \le \Lambda \\ u \in \partial f(x_0)}} \min_{0 \le \alpha \le K} \; \sqrt{\alpha^2 + 1}\, \|g\| \beta - \alpha \|\beta h - \lambda u\| - \frac{\beta^2}{2}.$$

The objective function above is continuous, convex in $\alpha$
and concave in $\beta, u$. Also the constraint sets are convex
and bounded. Thus, by [21, Cor. 37.3.2], we can flip the order
of max-min. Also, for $\beta > 0$, $\min_{u \in \partial f(x_0)} \|\beta h - \lambda u\| =
\beta\, \mathrm{dist}(h, (\lambda/\beta)\partial f(x_0))$. With these, normalizing with $1/m$
and appropriately rescaling $\beta$:

$$\phi_0(g, h) = \min_{0 \le \alpha \le K} \max_{0 \le \beta \le \Lambda} \mathcal{L}(\alpha, \beta; g, h) := \sqrt{\alpha^2 + 1}\, \beta \frac{\|g\|}{\sqrt{m}} - \alpha\beta \frac{1}{\sqrt{m}} \mathrm{dist}\!\left(h, \frac{\lambda}{\beta\sqrt{m}} \partial f(x_0)\right) - \frac{\beta^2}{2}. \qquad (24)$$

Normalization here is convenient for the purposes of applying
statement (iii) of [15, Thm. II.2], which follows.


D. Applying [15, Thm. II.2] 

(24) describes "Gordon's optimization" (modulo normalization
with $m$) corresponding to (18) in the limit of $\sigma \to 0$.
Also, the variable $\alpha$ in (24) plays the role of $\|\tilde{w}_\sigma\|$. The idea
is that problem (24) behaves in the large system limit just
like (18). This is formalized in the lemma below, which is a
direct corollary of [15, Thm. II.2] applied to our setup and
combined with our analysis thus far. For the statement of
the lemma recall that we are operating in the large-system
limit in which problem dimensions $n$, $m$, $\min_{\tau \ge 0} D(\tau)$ grow
linearly to infinity (cf. Section II-B.1). Also, we use the standard
notation $X_n \xrightarrow{P} c$ to denote convergence in probability of
$\{X_n\}_{n=1}^{\infty}$ to $c \in \mathbb{R}$ as $n \to \infty$.

Lemma 1.2: Let $\hat{w}_\sigma$ and $\mathrm{Cost}_\sigma$ be a minimizer and the
optimal cost of (18), respectively. Also, recall (24). Suppose
$d : [0, \infty) \to \mathbb{R}$ is such that $\max_{0 \le \beta \le \Lambda} \mathcal{L}(\alpha, \beta; g, h) \xrightarrow{P}
d(\alpha)$ for all $\alpha \in [0, K]$, and $d(\alpha) \ge d(\alpha_*) + \zeta(\alpha - \alpha_*)^2$, $\forall \alpha \in
[0, K]$, for some $\alpha_* \in [0, K]$ and $\zeta > 0$. Then,

$$\lim_{\sigma \to 0} \|\hat{w}_\sigma\|_2 \xrightarrow{P} \alpha_* \quad \text{and} \quad \lim_{\sigma \to 0} \frac{\mathrm{Cost}_\sigma}{\sigma^2 m} \xrightarrow{P} d(\alpha_*). \qquad (25)$$

Recall $\hat{w}_\sigma = (\hat{x} - x_0)/\sigma$, thus $\|\hat{w}_\sigma\|^2 = \mathrm{NSE}(\sigma)$. In what
follows, we construct a deterministic function $d$ that satisfies
the requirements of Lemma 1.2 and prove that $\alpha_*$ satisfies
the formula of Theorem 2.1. This will complete the proof.

To suppress notation, define $\lambda_\beta := \lambda/(\beta\sqrt{m})$.

Both functions $\|g\|$ and $\mathrm{dist}(h, \lambda_\beta \partial f(x_0))$ are 1-Lipschitz
in their arguments. Then, the classical Gaussian
concentration of Lipschitz functions implies that they concentrate
around $\sqrt{m}$ and $\sqrt{D(\lambda_\beta)}$, respectively
(e.g. [1, Lem. B.2]). Hence, we have for all $\alpha, \beta$:

$$\mathcal{L}(\alpha, \beta; g, h) \xrightarrow{P} d(\alpha, \beta) := \sqrt{\alpha^2 + 1}\, \beta - \alpha\beta \sqrt{\frac{D(\lambda_\beta)}{m}} - \frac{\beta^2}{2}. \qquad (26)$$

As we have seen, $\mathcal{L}(\alpha, \beta; g, h)$ is convex in $\alpha$ and concave in
$\beta$. Since taking limits preserves convexity, the same is true
for $d(\alpha, \beta)$. Next, define

$$d(\alpha) := \max_{0 \le \beta \le \Lambda} d(\alpha, \beta).$$

We claim that this satisfies the prerequisites of Lemma 1.2.

First, we show the convergence part. It suffices to prove
that for each $\alpha$ the convergence in (26) holds uniformly over
all $\beta \in [0, \Lambda]$. Concavity of $\mathcal{L}(\alpha, \beta)$ in its second argument is
critical. In particular, the claim follows from [26, Cor. II.1]:
"point-wise convergence in probability of concave functions
implies uniform convergence in compact spaces".

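Next, we compute an appropriate $\alpha_*$. Consider

$$(\alpha_*, \beta_*) := \arg\min_{0 \le \alpha \le K} \max_{0 \le \beta \le \Lambda} d(\alpha, \beta). \qquad (27)$$

We compute those in the next lemma; see Section E for a
proof.

Lemma 1.3: Consider the optimization in (27). Let

$$\alpha_* := \sqrt{\frac{D(\mathrm{map}^{-1}(\lambda))}{m - D(\mathrm{map}^{-1}(\lambda))}} \quad \text{and} \quad \beta_* := \frac{\lambda}{\mathrm{map}^{-1}(\lambda)\sqrt{m}}.$$

Then, there exist $K, \Lambda$ satisfying $0 < \alpha_* < K < \infty$ and
$0 < \beta_* < \Lambda < \infty$, such that $(\alpha_*, \beta_*)$ are optimal in (27).

Let $K, \Lambda$ in (27) be as in Lemma 1.3. It remains to prove
$d(\alpha) \ge d(\alpha_*) + \zeta(\alpha - \alpha_*)^2$ for some $\zeta > 0$. Fix $\alpha \in [0, K]$.
Clearly,

$$d(\alpha) = \max_{0 \le \beta \le \Lambda} d(\alpha, \beta) \ge d(\alpha, \beta_*).$$

We will now use the fact that for fixed $\beta \in (0, \Lambda]$, the
function $d(\alpha, \beta)$ is strongly convex in $0 \le \alpha \le K$. Indeed,

$$\frac{\partial^2 d}{\partial \alpha^2} = \frac{\beta}{(\alpha^2 + 1)^{3/2}} \ge \frac{\beta}{(K^2 + 1)^{3/2}}.$$

Recall $\beta_* > 0$ and let $\zeta := \beta_* / \left(2(K^2 + 1)^{3/2}\right) > 0$. Then, since
$\alpha_*$ minimizes $d(\cdot, \beta_*)$ and $d(\alpha_*, \beta_*) = d(\alpha_*)$,

$$d(\alpha) \ge d(\alpha, \beta_*) \ge d(\alpha_*, \beta_*) + \zeta(\alpha - \alpha_*)^2.$$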


E. Proofs of Auxiliary Results 

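1) Lemma 1.1: Denote by $\mathcal{L}(\sigma, w_\sigma, \beta, u)$ the objective function
in (20) and consider $0 < \sigma_1 < \sigma_2 < \infty$. Let $w^{(2)}$
be an optimal solution to the min-max problem in (20) for
$\sigma_2$. Then, let $(\beta^{(1)}, u^{(1)}) = \arg\max_{\beta, u} \mathcal{L}(\sigma_1, w^{(2)}, \beta, u)$.
Clearly,

$$\phi(\sigma_1) \le \mathcal{L}(\sigma_1, w^{(2)}, \beta^{(1)}, u^{(1)}).$$

Using $\lambda/\sigma_1 > \lambda/\sigma_2$ and (21),

$$\mathcal{L}(\sigma_1, w^{(2)}, \beta^{(1)}, u^{(1)}) \le \mathcal{L}(\sigma_2, w^{(2)}, \beta^{(1)}, u^{(1)}).$$

But,

$$\mathcal{L}(\sigma_2, w^{(2)}, \beta^{(1)}, u^{(1)}) \le \max_{\beta, u} \mathcal{L}(\sigma_2, w^{(2)}, \beta, u) = \phi(\sigma_2).$$

Combine the above chain of inequalities to conclude.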

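2) Lemma 1.3: Let $\alpha_*, \beta_*$ be as in the statement of the
lemma. Also, let $\tau_* := \mathrm{map}^{-1}(\lambda) > 0$ (cf. Definition 2.2).
Notice that $\alpha_*, \beta_* > 0$ and set

$$0 < K = 2\alpha_* < \infty \quad \text{and} \quad 0 < \Lambda = 2\beta_* < \infty.$$

As we have seen, $d(\alpha, \beta)$ is convex-concave. Also, the
constraint sets are convex and compact, hence the order of
min and max in (27) can be flipped and

$$(\alpha_*, \beta_*) = \arg\max_{0 \le \beta \le \Lambda} \min_{0 \le \alpha \le K} d(\alpha, \beta).$$

We have $0 < \beta_* < \Lambda$ and $D(\lambda_{\beta_*}) < m$, and $D(\lambda_\beta)$ is
continuous in $\beta$ (cf. [1, Lem. 8.1]). Thus, there exists an open
neighborhood $\mathcal{N}_1 \subset [0, \Lambda]$ of $\beta_*$ such that $D(\lambda_\beta) < m$, $\forall \beta \in \mathcal{N}_1$.
Fix any such $\beta \in \mathcal{N}_1$ and let

$$d_*(\beta) := \min_{0 \le \alpha \le K} d(\alpha, \beta). \qquad (28)$$

Differentiating with respect to $\alpha$, we find

$$\frac{\partial d}{\partial \alpha} = \beta \left( \frac{\alpha}{\sqrt{\alpha^2 + 1}} - \sqrt{\frac{D(\lambda_\beta)}{m}} \right).$$

It can be checked that $\alpha_*(\beta) = \sqrt{D(\lambda_\beta)/(m - D(\lambda_\beta))}$ is the unique
solution to the equation $\partial d/\partial \alpha = 0$. In particular, $\alpha_* =
\alpha_*(\beta_*)$ and is feasible, i.e. $\alpha_*(\beta_*) \in (0, K)$. From continuity
of $D(\cdot)$ (cf. [1, Lem. 8.1]), $\alpha_*(\beta)$ is a continuous
function of $\beta$. Hence, there exists an open neighborhood of
$\beta_*$, say $\mathcal{N}_2 \subset \mathcal{N}_1$, such that $\alpha_*(\beta) \in (0, K)$ for all
$\beta \in \mathcal{N}_2$. For any such $\beta \in \mathcal{N}_2$, $\alpha_*(\beta)$ satisfies the first-order
optimality conditions of the convex minimization in (28),
thus is optimal:

$$d_*(\beta) = -\frac{\beta^2}{2} + \frac{\beta}{\sqrt{m}} \sqrt{m - D(\lambda_\beta)}, \quad \forall \beta \in \mathcal{N}_2.$$

Differentiating this with respect to $\beta$ finds:

$$\frac{\partial d_*}{\partial \beta} = -\beta + \frac{1}{\sqrt{m}} \cdot \frac{m - D(\lambda_\beta) - C(\lambda_\beta)}{\sqrt{m - D(\lambda_\beta)}},$$

where we have used $dD(\tau)/d\tau = -(2/\tau)C(\tau)$, $\tau > 0$ (cf.
[17, Lem. C.2]). Note that the second summand above is
equal to $(\beta/\lambda)\,\mathrm{map}(\lambda/(\beta\sqrt{m}))$. With this, it is easy to verify that
$\beta_*$ is such that $\partial d_*/\partial \beta = 0$. From (28), $d_*(\beta)$ is concave as
the point-wise minimum of concave functions. Thus, the first-order
optimality conditions satisfied by $\beta_*$ are sufficient,
which completes the proof.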













